Easy2Siksha.com
GNDU QUESTION PAPERS 2023
BA/BSc 6th SEMESTER
QUANTITATIVE TECHNIQUES
(Quantave Techniques – VI)
Time Allowed: 3 Hours    Maximum Marks: 100
Note: Aempt Five quesons in all, selecng at least One queson from each secon.
The Fih queson may be aempted from any secon.
All quesons carry equal marks.
SECTION – A
I. Discuss Ordinary Least Squares (OLS) method.
Fit a linear regression model to the following data taking X as the dependent variable:
X: 50   45   70   75   90   55   100   120   135   130
Y: 60   80   100  130  140  160  180   200   220   240
II.(a) Discuss the scope, nature and methodology of econometrics.
(b) Explain Simple Linear Regression Model.
SECTION – B
III.(a) Explain the Gauss–Markov Theorem.
(b) Dierenate between R² and Adjusted R².
Give their importance in regression analysis.
IV.(a) What is test of significance?
A stenographer claims that she can take dictation at the rate of 120 words per minute.
Can we reject her claim on the basis of 100 trials in which she demonstrates a mean of 116
words with a standard deviation of 15 words?
Use 5% level of significance.
(b) Explain BLUE (Best Linear Unbiased Estimator).
SECTION-C
V. What is Multicollinearity problem? What are the sources, consequences and tests of
Multicollinearity problem in regression analysis?
VI. (a) What are the types and consequences of specification errors?
(b) Explain tests and remedial measures of heteroscedasticity.
SECTION-D
VII. (a) Dierenate between Distributed Lag and Auto Regressive Models.
(b) Explian the sources and remedial measures of auto-correlaon problem in regression
analysis.
VIII. (a) Explain the uses of dummy variables.
(b) Explain the tests to detect the auto-correlation problem in regression analysis.
GNDU ANSWER PAPERS 2023
BA/BSc 6th SEMESTER
QUANTITATIVE TECHNIQUES
(Quantave Techniques – VI)
Time Allowed: 3 Hours    Maximum Marks: 100
Note: Aempt Five quesons in all, selecng at least One queson from each secon.
The Fih queson may be aempted from any secon.
All quesons carry equal marks.
SECTION – A
I. Discuss Ordinary Least Squares (OLS) method.
Fit a linear regression model to the following data taking X as the dependent variable:
X: 50   45   70   75   90   55   100   120   135   130
Y: 60   80   100  130  140  160  180   200   220   240
Ans: Part 1: Understanding the Ordinary Least Squares (OLS) Method
Imagine you have two variables that seem related. For example:
Study hours → Exam marks
Advertising → Sales
Rainfall → Crop yield
In statistics, we often want to describe this relationship with a straight line, called a
regression line.
But here comes the question:
Among all possible straight lines, which one best fits the data?
This is exactly what the Ordinary Least Squares (OLS) method does.
The basic idea of OLS
Suppose we want to predict a variable X from another variable Y.
We assume a linear relationship:
X = a + bY
Where:
a = intercept (value of X when Y = 0)
b = slope (how much X changes when Y increases by 1)
But real data never lies perfectly on a line. So each point has an error (difference between
actual X and predicted X).
OLS says:
Choose the line for which the sum of squared errors is minimum.
Why squared errors?
Because:
Positive and negative errors don’t cancel
Large errors get penalized more
Mathematical solution becomes easy
So OLS literally means:
“Find the line that minimizes the total squared vertical distances between observed points
and the line.”
Part 2: Given Data
We are asked to take X as dependent variable and fit regression of X on Y.
X: 50   45   70   75   90   55   100   120   135   130
Y: 60   80   100  130  140  160  180   200   220   240
So our regression form is:
X = a + bY
Part 3: Formula for Regression of X on Y
OLS gives:
b = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)²
a = X̄ − b·Ȳ
So we need:
Mean of X
Mean of Y
Deviations
Step 1: Calculate Means
Sum of X = 50+45+70+75+90+55+100+120+135+130 = 870
X̄ = 870/10 = 87
Now Y:
Sum of Y = 60+80+100+130+140+160+180+200+220+240 = 1510
Ȳ = 1510/10 = 151
Step 2: Compute Deviations Table
We calculate deviations of X and Y from their means, their products, and squared Y deviations.
(Condensed results)
Σ(X − X̄)(Y − Ȳ) = 16280
Σ(Y − Ȳ)² = 32490
Step 3: Calculate Slope
b = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)²
b = 16280 / 32490
b = 0.5011 ≈ 0.501
Step 4: Calculate Intercept
a = X̄ − b·Ȳ
a = 87 − (0.5011 × 151)
a = 87 − 75.66
a = 11.34
Final Regression Equation
X = 11.34 + 0.501Y
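The computation above can be cross-checked with a short script (a minimal sketch using NumPy; the variable names are illustrative):

```python
import numpy as np

# Data from the question: X is the dependent variable, Y the explanatory one
X = np.array([50, 45, 70, 75, 90, 55, 100, 120, 135, 130], dtype=float)
Y = np.array([60, 80, 100, 130, 140, 160, 180, 200, 220, 240], dtype=float)

# OLS slope and intercept for the regression of X on Y
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((Y - Y.mean()) ** 2)
a = X.mean() - b * Y.mean()

print(round(b, 3))  # slope ≈ 0.501
print(round(a, 2))  # intercept ≈ 11.34
```

Any spreadsheet or statistics package should reproduce the same two numbers.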
Part 4: Interpretation (Very Important for Exams)
Now let’s understand what this line means.
1. Slope (0.501)
If Y increases by 1 unit, X increases by about 0.5 units.
So X grows at roughly half the rate of Y.
2. Intercept (11.34)
When Y = 0, predicted X ≈ 11.34
(This may not have practical meaning if Y cannot be zero, but mathematically it anchors the
line.)
Part 5: Why OLS Regression is Useful
OLS regression helps us:
Predict values
Measure relationships
Understand trends
Forecast outcomes
Examples:
Income vs consumption
Height vs weight
Cost vs production
Part 6: Conceptual Visualization
Imagine plotting these points on graph paper:
Y on horizontal axis
X on vertical axis
Points scatter upward. OLS finds the best straight line through the cloud.
Not through every point but closest overall.
Part 7: Key Properties of OLS Regression
Students often remember these exam points:
1. Sum of residuals = 0
2. Mean of predicted X = mean of actual X
3. Line passes through the point (X̄, Ȳ)
4. Minimizes squared errors
5. Unique best linear fit
Final Answer (Exam Style)
Ordinary Least Squares (OLS) Method:
The OLS method is a statistical technique used to estimate the parameters of a linear
regression model. It determines the best-fitting line by minimizing the sum of squared
differences between observed and predicted values of the dependent variable. This ensures
the most accurate linear representation of the relationship between variables.
Given the data and taking X as dependent variable, the regression of X on Y is:
X = a + bY
Where:
b = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)² = 16280 / 32490 = 0.501
a = X̄ − b·Ȳ = 87 − 0.5011 × 151 = 11.34
Hence, the regression equation is:
X = 11.34 + 0.501Y
II.(a) Discuss the scope, nature and methodology of econometrics.
(b) Explain Simple Linear Regression Model.
Ans: (a) Scope, Nature, and Methodology of Econometrics
1. Nature of Econometrics
Econometrics is the branch of economics that uses mathematics, statistics, and economic
theory to analyze real-world data. In simple terms, it’s about testing economic ideas with
numbers.
Economics gives the theory (e.g., “higher income leads to higher consumption”).
Statistics provides the tools (e.g., regression analysis).
Econometrics combines them to check if the theory holds true in practice.
So, econometrics is not just abstract—it’s practical, bridging the gap between theory and
reality.
2. Scope of Econometrics
The scope of econometrics is vast, covering almost every area of economics:
Testing Hypotheses: For example, does education really increase wages?
Forecasting: Predicting GDP growth, inflation, or unemployment using past data.
Policy Evaluation: Measuring the impact of government policies like subsidies or tax
cuts.
Business Applications: Firms use econometrics to forecast demand, set prices, or
evaluate marketing strategies.
Financial Markets: Econometrics helps analyze stock prices, interest rates, and risk.
In short, econometrics is the “laboratory” of economics—it tests ideas, predicts outcomes,
and guides decisions.
3. Methodology of Econometrics
The methodology of econometrics follows a systematic process:
1. Formulation of Economic Model
o Start with a theory. Example: Consumption depends on income.
o Express it mathematically: C = f(Y).
2. Specification of Econometric Model
o Translate theory into an equation with parameters:
C = a + bY + u
where u is the error term.
3. Collection of Data
o Gather real-world data (income and consumption figures).
4. Estimation of Parameters
o Use statistical techniques (like regression) to estimate a and b.
5. Hypothesis Testing
o Test if the estimated parameters make sense. Is b > 0? Does income really
increase consumption?
6. Forecasting and Policy Analysis
o Use the model to predict future consumption or evaluate policy impacts.
7. Validation
o Check if the model fits reality. If not, refine it.
Example: If the model predicts that a 10% rise in income increases consumption by 8%,
policymakers can use this to design economic strategies.
(b) Simple Linear Regression Model
Now let’s move to the Simple Linear Regression Model, which is the most basic yet
powerful tool in econometrics.
1. Definition
A simple linear regression model studies the relationship between two variables:
One dependent variable (the outcome we want to explain).
One independent variable (the factor we think influences the outcome).
Mathematically:
Y = a + bX + u
Y = dependent variable (e.g., consumption).
X = independent variable (e.g., income).
a = intercept (value of Y when X = 0).
b = slope (change in Y when X increases by 1 unit).
u = error term (captures other influences not included in the model).
2. Estimation
Econometricians use the Ordinary Least Squares (OLS) method to estimate a and b.
OLS finds the line that best fits the data points by minimizing the sum of squared
errors.
In simple terms, it draws the “best straight line” through the scatter plot of data.
3. Interpretation
Suppose we estimate:
C = 50 + 0.8Y
Intercept (a = 50): Even if income is zero, consumption is 50 (basic survival
spending).
Slope (b = 0.8): For every extra unit of income, consumption increases by 0.8 units.
This tells us how strongly income influences consumption.
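The estimation step behind such an equation can be sketched in a few lines (an illustrative example with made-up data generated exactly as C = 50 + 0.8·Y, so OLS recovers the assumed coefficients):

```python
import numpy as np

# Hypothetical income data; consumption is built exactly as C = 50 + 0.8*Y
income = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
consumption = 50 + 0.8 * income

# OLS via least squares: regress consumption on income with an intercept column
A = np.column_stack([np.ones_like(income), income])
coef, *_ = np.linalg.lstsq(A, consumption, rcond=None)
a_hat, b_hat = coef
print(a_hat, b_hat)  # recovers approximately 50 and 0.8
```

With real (noisy) data the estimates would only approximate the true values, which is exactly why the assumptions below matter.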
4. Assumptions of the Model
For regression results to be valid, certain assumptions must hold:
Linear relationship between X and Y.
Error term has zero mean.
No correlation between X and the error term.
Constant variance of errors (homoscedasticity).
Errors are independent.
If these assumptions are violated, results may be biased or misleading.
5. Applications
Economics: Relationship between education and wages.
Business: Impact of advertising on sales.
Health: Effect of exercise on weight loss.
Finance: Link between interest rates and investment.
Conclusion
Econometrics is the science of testing and applying economic theories with data. Its scope
covers everything from policy evaluation to business forecasting. Its methodology is
systematic: starting with theory, building models, estimating parameters, and validating
results.
The Simple Linear Regression Model is the foundation of econometrics. By studying the
relationship between two variables, it helps us quantify economic ideas. Though simple, it is
powerful, forming the basis for more complex models.
SECTION – B
III.(a) Explain the Gauss–Markov Theorem.
(b) Dierenate between R² and Adjusted R².
Give their importance in regression analysis.
Ans: III.(a) Gauss–Markov Theorem: Simple Explanation
Imagine you are trying to predict a student’s marks based on the number of hours they
study. You collect data from many students and draw a straight line that best fits the data.
This line is called the regression line.
But now a question arises:
Is this the best possible line we can draw?
Or could some other method give better estimates?
This is exactly where the Gauss–Markov Theorem comes in.
What the Gauss–Markov Theorem Says (in simple words)
The theorem states:
If certain basic assumptions of regression are satisfied, then the Ordinary Least Squares
(OLS) regression estimator is the Best Linear Unbiased Estimator (BLUE).
Let’s understand this slowly.
Step-by-step meaning of “Best Linear Unbiased Estimator (BLUE)”
1. Linear
The estimates are calculated using a linear equation (straight-line model).
Example:
Y = a + bX + u
Here, we assume the relationship between variables is linear.
2. Unbiased
An estimator is unbiased if, on average, it gives the correct value.
Think like this:
If you repeatedly estimate the effect of study hours on marks using different samples of
students, the average estimate will equal the true effect.
So OLS does not systematically overestimate or underestimate.
3. Best
“Best” here means minimum variance.
Imagine many students draw regression lines from different samples.
Some lines fluctuate a lot (unstable estimates), others are consistent.
The Gauss–Markov theorem says:
Among all unbiased linear estimators, OLS estimates vary the least.
So they are the most reliable.
Conditions required (assumptions)
Gauss–Markov works only if certain conditions hold:
1. Linear relationship between variables
2. Errors have mean = 0
3. Errors have constant variance (homoscedasticity)
4. Errors are uncorrelated
5. No perfect multicollinearity
If these assumptions are satisfied → OLS is BLUE.
Importance of the Gauss–Markov Theorem
This theorem is extremely important in regression analysis because:
It justifies using OLS method
It proves OLS is statistically efficient
It ensures reliable coefficient estimates
It builds foundation of econometrics
Without Gauss–Markov, we wouldn’t know whether OLS is trustworthy.
So the theorem basically tells us:
“If your regression assumptions are correct, then OLS is the best method you can use.”
III.(b) Difference between R² and Adjusted R²
Now let’s move to the second part.
When we run regression, we want to know:
How well does the model explain the data?
For this, we use R² and Adjusted R².
R² (Coefficient of Determination)
R² tells us:
How much of the variation in the dependent variable is explained by the independent
variables.
Example:
If R² = 0.80 → 80% of variation in marks is explained by study hours.
So R² measures goodness of fit.
Formula idea (conceptual)
R² = Explained variation / Total variation
Range:
0 ≤ R² ≤ 1
0 → model explains nothing
1 → perfect explanation
Problem with R²
Here is the catch:
R² always increases when you add more variables.
Even useless variables increase R² slightly.
Example:
Marks = Study hours + Shoe size
Shoe size is irrelevant, but R² may still rise.
So R² can mislead us.
Adjusted R²: Improved Version
Adjusted R² fixes this problem.
It adjusts for:
Number of variables
Sample size
So it only increases if new variables actually improve the model.
Key idea
Penalizes unnecessary variables
Rewards meaningful predictors
So Adjusted R² is more realistic.
Main Difference Between R² and Adjusted R²

Feature | R² | Adjusted R²
Meaning | % of variation explained | Corrected % of variation
Effect of adding variables | Always increases | May increase or decrease
Penalty for useless variables | No | Yes
Reliability | Less | More
Usefulness | Basic fit measure | True model quality
Importance in Regression Analysis
Both R² and Adjusted R² are important tools.
Importance of R²
Measures model fit
Shows explanatory power
Easy to interpret
Useful for comparison
Importance of Adjusted R²
Prevents overfitting
Helps select correct variables
Gives realistic model accuracy
Preferred in multiple regression
Real-life Understanding
Imagine you are predicting income based on:
Education
Experience
Age
Height
Favorite color
If you add many irrelevant variables:
R² will increase
Adjusted R² will fall
So Adjusted R² tells the truth.
Final Summary
Gauss–Markov Theorem:
It states that under classical regression assumptions, the OLS estimator is the Best Linear
Unbiased Estimator (BLUE), meaning it has minimum variance among all unbiased linear
estimators. This theorem justifies the use of OLS in regression analysis and ensures efficient
and reliable coefficient estimation.
R²:
It measures the proportion of variation in the dependent variable explained by independent
variables. It indicates goodness of fit but always increases when variables are added.
Adjusted R²:
It is a modified form of R² that adjusts for number of predictors and sample size. It penalizes
irrelevant variables and provides a more accurate measure of model quality.
Importance:
R² and Adjusted R² help evaluate regression models, compare alternative models, detect
overfitting, and select meaningful predictors, thereby improving the reliability of statistical
analysis.
IV.(a) What is test of significance?
A stenographer claims that she can take dictation at the rate of 120 words per minute.
Can we reject her claim on the basis of 100 trials in which she demonstrates a mean of 116
words with a standard deviation of 15 words?
Use 5% level of significance.
(b) Explain BLUE (Best Linear Unbiased Estimator).
Ans: (a) Test of Significance
What is a Test of Significance?
A test of significance is a statistical method used to decide whether the observed data
provides enough evidence to reject a claim (hypothesis) about a population. In simple
words, it helps us check if the difference we see in data is real or just due to chance.
There are two key hypotheses:
Null Hypothesis (H₀): The claim we want to test.
Alternative Hypothesis (H₁): The opposite of the claim, which we accept if the data
strongly contradicts H₀.
We then calculate a test statistic and compare it with critical values (based on probability
levels like 5%). If the test statistic falls in the rejection region, we reject H₀.
The Stenographer Example
Claim: A stenographer says she can take dictation at 120 words per minute.
Data from 100 trials:
Sample mean = 116 words per minute
Standard deviation = 15 words
Sample size (n) = 100
Significance level = 5%
Step 1: State Hypotheses
H₀: μ = 120 (Her average speed is 120 words/minute).
H₁: μ ≠ 120 (Her average speed is not 120 words/minute).
This is a two-tailed test because we are checking for any difference (not just slower or
faster).
Step 2: Calculate Test Statistic
We use the z-test because the sample size is large (n = 100).
Formula:
z = (x̄ − μ) / (σ/√n)
Where:
x̄ = 116 (sample mean)
μ = 120 (claimed mean)
σ = 15 (standard deviation)
n = 100 (sample size)
z = (116 − 120) / (15/√100)
z = −4 / 1.5
z = −2.67
Step 3: Critical Value at 5% Level
For a two-tailed test at 5% significance:
Critical z-values = ±1.96
Step 4: Decision
Calculated z = -2.67
Since -2.67 < -1.96, it falls in the rejection region.
Conclusion: We reject the stenographer’s claim. The data shows her average speed is
significantly different (lower) than 120 words per minute.
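The same test can be scripted in a few lines (a minimal sketch; 1.96 is the two-tailed 5% critical point of the standard normal):

```python
import math

# Stenographer example: H0: mu = 120, sample mean 116, s.d. 15, n = 100
sample_mean, claimed_mean, sd, n = 116.0, 120.0, 15.0, 100

# z-test is appropriate because the sample is large
z = (sample_mean - claimed_mean) / (sd / math.sqrt(n))
reject = abs(z) > 1.96  # two-tailed test at the 5% level

print(round(z, 2), reject)  # -2.67 True
```

Since |−2.67| exceeds 1.96, the script reaches the same decision as the hand calculation: reject H₀.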
Why This Matters
Tests of significance are widely used in economics, medicine, and social sciences to check
claims. In this case, it helps us objectively evaluate performance rather than relying on
personal statements.
(b) BLUE: Best Linear Unbiased Estimator
Now let’s move to the second part: BLUE.
What is BLUE?
In econometrics, when we estimate parameters (like slope and intercept in regression), we
want our estimates to be:
Best: Minimum variance (most precise).
Linear: Based on a linear function of observed data.
Unbiased: On average, the estimate equals the true value.
Estimator: A rule or formula used to calculate the parameter.
The Ordinary Least Squares (OLS) method is considered BLUE under certain conditions.
Why OLS is BLUE (Gauss-Markov Theorem)
The Gauss-Markov Theorem states that under classical assumptions (like linearity, no
autocorrelation, constant variance of errors, and zero mean of errors), the OLS estimator is
the Best Linear Unbiased Estimator.
Linear: OLS estimates are linear functions of the dependent variable.
Unbiased: Expected value of the estimator equals the true parameter.
Best: Among all linear unbiased estimators, OLS has the smallest variance, meaning
it is the most efficient.
Example of BLUE in Regression
Suppose we estimate the relationship between income (X) and consumption (Y):
Y = a + bX + u
OLS gives us estimates of a and b.
If assumptions hold, these estimates are unbiased (on average correct).
They are linear combinations of observed values.
They have minimum variance compared to other linear unbiased methods.
Thus, OLS is BLUE.
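Unbiasedness can be illustrated with a small Monte Carlo sketch (the true coefficients a = 10, b = 2, the noise level, and the seed are all assumptions for illustration): averaging the OLS slope over many independent samples lands very close to the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
true_a, true_b = 10.0, 2.0
x = np.linspace(0, 10, 50)

slopes = []
for _ in range(2000):
    # Fresh sample each replication: same x, new homoscedastic errors
    y = true_a + true_b * x + rng.normal(scale=2.0, size=x.size)
    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b_hat)

# Unbiasedness: on average the OLS slope equals the true slope
mean_slope = float(np.mean(slopes))
print(mean_slope)  # close to 2.0
```

Individual estimates scatter around 2, but their average converges on it, which is what "unbiased" means.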
Conclusion
Test of Significance: Helps us decide whether to accept or reject a claim based on
data. In the stenographer’s case, her claim of 120 words/minute was rejected
because the observed mean (116) was significantly lower at the 5% level.
BLUE: Refers to the desirable properties of OLS estimators in regression. They are
Best (minimum variance), Linear, and Unbiased, making them reliable tools for
econometric analysis.
In short, tests of significance allow us to judge claims with evidence, while BLUE ensures our
regression estimates are trustworthy and efficient. Together, they form the backbone of
statistical and econometric reasoning.
SECTION-C
V. What is Multicollinearity problem? What are the sources, consequences and tests of
Multicollinearity problem in regression analysis?
Ans: What is the Multicollinearity Problem?
Imagine you want to study how education and experience affect a person’s salary. So you
collect data and run a regression model:
Salary = a + b₁·Education + b₂·Experience + u
Now suppose in your data, people who have more education also usually have more
experience. In other words, education and experience move together.
Because of this, your regression model gets confused. It cannot clearly separate how much
salary increase comes from education and how much comes from experience.
This confusion in regression due to high correlation among independent variables is
called multicollinearity.
Simple Definition
Multicollinearity is a situation in regression analysis where two or more independent
variables are highly correlated with each other.
So instead of each variable giving unique information, they start overlapping.
Why Multicollinearity is a Problem (Intuition)
Think of regression like a team project.
Each independent variable should bring different skills.
But if two variables bring the same skill, the teacher cannot decide who contributed what.
That’s exactly what happens in multicollinearity:
Regression cannot distinguish individual effects clearly.
Sources (Causes) of Multicollinearity
Multicollinearity doesn’t appear randomly. It usually comes from certain patterns in data or
model design.
1. Variables measuring similar concepts
Sometimes we include variables that represent almost the same thing.
Example:
Income
Consumption
Wealth
These are closely related economically. So they move together.
2. Derived or constructed variables
Sometimes we create variables from others.
Example:
Total income
Wage income
Non-wage income
Since
Total income = Wage + Non-wage
they will obviously be correlated.
3. Time trend in data
In time series data, many variables grow over time.
Example:
GDP
Population
Investment
Consumption
All increase year by year → high correlation → multicollinearity.
4. Dummy variable trap
When using categorical variables incorrectly.
Example:
Gender:
Male = 1 if male
Female = 1 if female
If both are included with intercept → perfect multicollinearity
because:
Male + Female = 1 always.
5. Small or limited sample data
When sample size is small, variables may accidentally appear highly correlated.
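The dummy variable trap (source 4 above) can be verified directly. A minimal sketch, assuming a made-up sample of six people: with an intercept plus both dummies, the design matrix loses full column rank, which is exactly perfect multicollinearity.

```python
import numpy as np

# Hypothetical sample: 1 = male in the first dummy, 1 = female in the second
male = np.array([1, 0, 1, 1, 0, 0])
female = 1 - male                      # Male + Female = 1 always

# Design matrix with an intercept column plus BOTH dummies
X = np.column_stack([np.ones(6), male, female])

rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1])  # rank 2 < 3 columns -> perfect multicollinearity
```

Dropping one dummy (or the intercept) restores full rank, which is why software and textbooks tell you to include only one of the two.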
Consequences (Effects) of Multicollinearity
Now let’s see why multicollinearity is dangerous for regression results.
1. Coefficients become unstable
Small change in data → big change in coefficients.
Example:
One regression: Education effect = 2000
Another regression: Education effect = 500
This instability is due to multicollinearity.
2. Signs may become wrong
Economic theory says effect should be positive, but regression shows negative.
Example:
Income → consumption should be positive
But multicollinearity may show negative coefficient.
3. Standard errors become large
Because regression is confused, uncertainty increases.
So standard errors rise.
4. Insignificant t-tests despite high R²
This is a classic symptom.
You may see:
R² very high (model fits well overall)
But individual variables insignificant
Why? Because variables overlap in explaining variation.
5. Difficult interpretation
We cannot confidently say which variable truly affects dependent variable.
Types of Multicollinearity
1. Perfect multicollinearity
Exact linear relationship.
Example:
X₃ = X₁ + X₂
Regression cannot even be estimated.
2. Imperfect multicollinearity
High but not exact correlation.
Regression runs, but results unreliable.
This is most common.
Tests for Multicollinearity
Now the practical question:
How do we detect multicollinearity?
1. Correlation Matrix Method
Check correlation among independent variables.
If correlation > 0.8 or 0.9 → multicollinearity likely.
Example:
Corr(Education, Experience) = 0.92 → problem.
Limitation:
Only detects pairwise correlation, not group correlation.
2. Variance Inflation Factor (VIF)
Most popular and reliable test.
Formula idea:
VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing variable j on the other predictors.
This measures how much the variance of a coefficient increases due to correlation.
Rule of thumb:
VIF = 1 → no multicollinearity
VIF > 5 → moderate
VIF > 10 → serious multicollinearity
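A hand-rolled VIF check might look like this (a sketch using NumPy only; the simulated education/experience data and the seed are assumptions for illustration):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2), where R^2
    comes from regressing column j on the remaining columns (with intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
educ = rng.normal(size=200)
exper = educ + 0.1 * rng.normal(size=200)   # nearly a copy of educ
X = np.column_stack([educ, exper])

print(vif(X, 0))  # far above 10 -> serious multicollinearity by the rule of thumb
```

With such strongly overlapping regressors the VIF lands well past the "serious" threshold of 10.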
3. Tolerance Test
Tolerance = 1 / VIF
Rule:
Tolerance < 0.1 → multicollinearity problem.
4. High R² but Low t-values
If the model R² is high but individual variables are insignificant, suspect multicollinearity.
(A related check, Klein’s rule of thumb, compares each auxiliary-regression R² with the overall R².)
5. Eigenvalue / Condition Index Method
Advanced method used in econometrics software.
Rule:
Condition index > 30 → severe multicollinearity.
How to Remove or Reduce Multicollinearity
Students often ask this too.
1. Remove one of correlated variables
If Education and Experience highly correlated, keep only one.
2. Combine variables
Create index or composite variable.
Example:
Socioeconomic status index.
3. Increase sample size
More data reduces correlation noise.
4. Use first differences (time series)
Removes trend-based correlation.
5. Centering variables
Subtract mean from variables.
Helps especially with interaction terms.
Final Conceptual Summary
Multicollinearity = independent variables highly correlated
Causes confusion in estimating individual effects
Leads to unstable, unreliable coefficients
Detected by correlation, VIF, tolerance, etc.
VI. (a) What are the types and consequences of specification errors?
(b) Explain tests and remedial measures of heteroscedasticity.
Ans: (a) Types and Consequences of Specification Errors
What is a Specification Error?
A specification error occurs when the econometric model we build does not correctly
represent the true relationship between variables. In simple words, it’s like writing the
wrong recipe for a dish: you may leave out an ingredient, add the wrong one, or measure
incorrectly. The result will not match reality.
Types of Specification Errors
1. Omission of Relevant Variables
o Leaving out a variable that actually influences the dependent variable.
o Example: Studying wages based only on education, while ignoring work
experience.
o Consequence: The effect of omitted variables may wrongly get absorbed into
the included ones, leading to biased estimates.
2. Inclusion of Irrelevant Variables
o Adding variables that do not affect the dependent variable.
o Example: Including shoe size in a wage equation.
o Consequence: Estimates remain unbiased but become inefficient (higher
variance).
3. Incorrect Functional Form
o Using the wrong mathematical relationship.
o Example: Assuming a linear relationship when the true relationship is
quadratic.
o Consequence: Predictions become misleading, and estimates may be biased.
4. Measurement Errors
o Using inaccurate data for variables.
o Example: Recording income incorrectly or using approximate figures.
o Consequence: Leads to biased and inconsistent estimates.
5. Simultaneity or Wrong Causal Direction
o Mis-specifying cause and effect.
o Example: Modeling consumption as causing income, instead of income
causing consumption.
o Consequence: Results become unreliable due to endogeneity.
Consequences of Specification Errors
Biased Estimates: Wrong conclusions about relationships.
Inefficient Estimates: Larger standard errors, less precision.
Invalid Hypothesis Testing: t-tests and F-tests may give misleading results.
Poor Forecasting: Predictions fail to match reality.
Policy Misguidance: Wrong models can lead to flawed economic policies.
In short: Specification errors distort the truth, making econometric analysis unreliable.
(b) Tests and Remedial Measures of Heteroscedasticity
What is Heteroscedasticity?
In regression analysis, heteroscedasticity occurs when the variance of the error term is not
constant across observations.
Homoscedasticity: Errors have equal variance (ideal case).
Heteroscedasticity: Errors vary with the level of the independent variable.
Example: In income vs. consumption data, richer households may show more variation in
spending than poorer ones.
Why is Heteroscedasticity a Problem?
OLS estimates remain unbiased, but they are no longer efficient.
Standard errors become unreliable, leading to incorrect hypothesis testing.
Confidence intervals and test statistics (t, F) lose validity.
Tests for Heteroscedasticity
1. Graphical Method
o Plot residuals against predicted values.
o If the spread increases or decreases systematically, heteroscedasticity is
present.
2. Breusch-Pagan Test
o Tests whether variance of errors is related to independent variables.
o A significant result indicates heteroscedasticity.
3. White’s Test
o A general test that does not require specifying the form of heteroscedasticity.
o Detects both heteroscedasticity and model misspecification.
4. Goldfeld-Quandt Test
o Splits data into two groups and compares error variances.
o Useful when heteroscedasticity is suspected to increase with certain
variables.
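The Goldfeld–Quandt idea can be sketched directly (an illustrative simulation, with assumed parameters and seed: the error spread grows with x, so the high-x half of the sample shows a much larger residual variance):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = np.sort(rng.uniform(1, 10, size=n))
y = 5 + 2 * x + x * rng.normal(size=n)   # error s.d. proportional to x

def ols_ssr(x, y):
    """Sum of squared residuals from a simple OLS fit of y on x."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    return np.sum(resid ** 2)

# Split the x-sorted sample in half and compare residual variances
half = n // 2
f_stat = ols_ssr(x[half:], y[half:]) / ols_ssr(x[:half], y[:half])
print(f_stat)  # well above 1 -> heteroscedasticity suspected
```

In the full test this ratio is compared against an F critical value; a ratio near 1 would instead suggest homoscedasticity.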
Remedial Measures
1. Transforming Variables
o Use logarithms or square roots to stabilize variance.
o Example: Taking log of income in regression models.
2. Weighted Least Squares (WLS)
o Assign weights to observations inversely proportional to error variance.
o This restores efficiency of estimates.
3. Robust Standard Errors
o Adjust standard errors to account for heteroscedasticity.
o Estimates remain unbiased, and hypothesis testing becomes valid again.
4. Model Redesign
o Sometimes heteroscedasticity arises due to omitted variables or wrong
functional form.
o Correcting specification errors can reduce heteroscedasticity.
Conclusion
Specification Errors: These occur when the econometric model is wrongly
designed: by omitting relevant variables, including irrelevant ones, using wrong
functional forms, or mismeasuring data. The consequences are serious: biased
estimates, poor forecasts, and misleading policy advice.
Heteroscedasticity: This problem arises when error variance is not constant. It
makes OLS inefficient and hypothesis testing unreliable. Tests like Breusch-Pagan,
White’s, and Goldfeld-Quandt help detect it, while remedies include variable
transformation, weighted least squares, and robust standard errors.
In short, econometrics is powerful only when models are correctly specified and
assumptions hold true. Specification errors and heteroscedasticity remind us that careful
design, testing, and correction are essential for trustworthy results.
SECTION-D
VII. (a) Dierenate between Distributed Lag and Auto Regressive Models.
(b) Explian the sources and remedial measures of auto-correlaon problem in regression
analysis.
Ans: VII (a) Difference between Distributed Lag and Auto-Regressive Models
Imagine you are studying how rainfall affects crop production. Now think carefully: does
rainfall affect crops only in the same year? Or can rainfall from previous years also influence
soil moisture and crop yield?
Obviously, past rainfall also matters.
This is where lag models come into regression analysis.
There are two important types:
Distributed Lag Model (DLM)
Auto-Regressive Model (AR)
Let’s understand both through a simple narrative.
1. Distributed Lag Model (DLM)
A Distributed Lag Model assumes that the current value of a dependent variable depends
on the current and past values of another independent variable.
In simple words:
“Today’s result depends not only on today’s cause but also on past causes.”
Example
Suppose we study how advertising affects sales.
This month’s advertising increases sales now
Last month’s advertising still influences customers
Even advertising from two months ago may affect brand recall
So sales today depend on advertising of several past months.
This spread-out influence is called distributed lag.
Mathematically (simple idea):
Salesₜ = a + b₀Adₜ + b₁Adₜ₋₁ + b₂Adₜ₋₂ + error
Here:
Adₜ = current advertising
Adₜ₋₁ = last month advertising
Adₜ₋₂ = two months ago advertising
The effect of advertising is distributed over time.
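A distributed-lag equation like the one above can be fitted by ordinary least squares once the lagged columns are built. A minimal sketch with NumPy, using made-up advertising figures and noise-free sales generated from known coefficients, so OLS recovers them exactly:

```python
import numpy as np

# Hypothetical monthly advertising series; sales are generated from the
# distributed-lag relation Sales_t = 10 + 0.5*Ad_t + 0.3*Ad_{t-1} + 0.1*Ad_{t-2}
ad = np.array([20, 25, 22, 30, 28, 35, 33, 40, 38, 45], dtype=float)
sales = 10 + 0.5 * ad[2:] + 0.3 * ad[1:-1] + 0.1 * ad[:-2]

# Regressor matrix [1, Ad_t, Ad_{t-1}, Ad_{t-2}]; the first two
# observations are lost because of the two lags.
X = np.column_stack([np.ones(len(sales)), ad[2:], ad[1:-1], ad[:-2]])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(coef)  # close to [10, 0.5, 0.3, 0.1]
```

With real, noisy data the estimated lag coefficients would only approximate the true ones, but the construction of the lagged columns is the same.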
2. Auto-Regressive Model (AR)
Now imagine another situation:
Suppose we study income of a person.
Does current income depend only on external factors?
No; it also depends on past income.
If someone earned ₹50,000 last year, their income this year will likely be related to that
level.
So here:
“Today’s value depends on its own past values.”
This is called an Auto-Regressive Model.
Example equation:
Incomeₜ = a + b₁Incomeₜ₋₁ + b₂Incomeₜ₋₂ + error
Here:
Current income depends on past income
The variable explains itself over time
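The same least-squares machinery fits an auto-regressive model; the only change is that the regressors are lags of the dependent variable itself. A sketch with a hypothetical income series generated from a known AR(2) relation:

```python
import numpy as np

# Hypothetical income series generated from the exact AR(2) relation
# Income_t = 5 + 0.6*Income_{t-1} + 0.2*Income_{t-2}
income = [50.0, 52.0]
for _ in range(8):
    income.append(5 + 0.6 * income[-1] + 0.2 * income[-2])
income = np.array(income)

# Regress the series on its own first and second lags
y = income[2:]
X = np.column_stack([np.ones(len(y)), income[1:-1], income[:-2]])
ar_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# ar_coef is close to [5, 0.6, 0.2]: the series "explains itself" through its lags
```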
Key Differences (Easy Comparison)
Basis   | Distributed Lag Model                                               | Auto-Regressive Model
Meaning | Current value depends on present & past values of another variable  | Current value depends on its own past values
Focus   | Effect of independent variable over time                            | Persistence of dependent variable
Example | Sales depend on past advertising                                    | Income depends on past income
Use     | Policy impact, marketing, economics                                 | Time series forecasting
Nature  | External lag effect                                                 | Internal lag effect
Simple memory trick:
Distributed Lag = past of X affects Y
Auto-Regressive = past of Y affects Y
VII (b) Sources and Remedial Measures of Autocorrelation in Regression
Now let’s move to the second part — autocorrelation.
Think of autocorrelation like this:
Suppose you record daily temperature.
If today is hot, tomorrow is also likely hot.
So errors in regression are not independent; they are related over time.
This is called autocorrelation (or serial correlation).
Definition (simple):
Autocorrelation occurs when regression errors are correlated with each other across time.
Sources (Causes) of Autocorrelation
Let’s understand why this problem happens.
1. Omitted Variables
Sometimes an important variable affecting Y is missing from the model.
Example:
Crop yield depends on rainfall AND soil fertility.
If we include rainfall but ignore soil fertility, the effect appears in errors.
Since soil fertility changes slowly over time, errors become correlated.
2. Wrong Functional Form
If the true relationship is nonlinear but we assume linear regression, residuals show
patterns.
Example:
Population growth is exponential, not linear.
Using linear regression causes systematic errors → autocorrelation.
3. Data Smoothing or Aggregation
Economic data like GDP, inflation, income often change gradually.
So consecutive observations are naturally related.
Example:
Monthly inflation this month ≈ last month’s inflation.
4. Time-Series Nature of Data
Autocorrelation is very common in:
GDP
Sales
Production
Prices
Income
Because these evolve over time continuously.
5. Measurement Errors
If the data collection method is consistent but biased, errors carry over across periods.
Example:
Survey method overestimates income every year similarly.
Why is Autocorrelation a Problem?
If autocorrelation exists:
OLS estimates remain unbiased
BUT standard errors become wrong
Hypothesis tests become unreliable
t and F tests become misleading
So regression conclusions may be incorrect.
Remedial Measures of Autocorrelation
Now let’s see how economists/statisticians fix this problem.
1. Include Missing Variables
If omitted factors cause autocorrelation, add them.
Example:
Add soil fertility in crop model
Add interest rate in investment model
This often reduces serial correlation.
2. Use Lagged Variables
If dependent variable depends on past values, include lag.
Example:
Consumptionₜ = a + bIncomeₜ + cConsumptionₜ₋₁ + error
This converts model into autoregressive form.
3. Transform the Data (Differencing)
Take change instead of level.
Instead of:
Incomeₜ
Use:
ΔIncomeₜ = Incomeₜ − Incomeₜ₋₁
This removes trend and serial correlation.
Very common in time-series econometrics.
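Differencing is easy to do by hand. A tiny sketch with hypothetical income figures: the levels trend steadily upward, while the first differences just fluctuate around a stable value:

```python
# A hypothetical income series with a strong upward trend
income = [100, 110, 121, 128, 140, 152, 160, 175]

# First differences: the change from one period to the next
d_income = [income[t] - income[t - 1] for t in range(1, len(income))]
print(d_income)  # [10, 11, 7, 12, 12, 8, 15] -- trend removed, one observation lost
```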
4. Generalized Least Squares (GLS)
When autocorrelation exists, OLS assumptions break.
GLS corrects covariance structure of errors.
Famous methods:
Cochrane-Orcutt method
Prais-Winsten method
These adjust regression to remove serial correlation.
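The core of the Cochrane-Orcutt idea can be sketched in a few lines: estimate ρ from the OLS residuals, then quasi-difference the data with it. The residuals and y-values below are hypothetical, built so that eₜ = 0.7eₜ₋₁ holds exactly:

```python
# Hypothetical residuals following e_t = 0.7*e_{t-1} exactly
resid = [1.0]
for _ in range(9):
    resid.append(0.7 * resid[-1])

# Estimate rho: rho_hat = sum(e_t * e_{t-1}) / sum(e_{t-1}^2)
num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
rho = num / den  # recovers 0.7

# Quasi-differenced variable: y*_t = y_t - rho * y_{t-1};
# OLS is then re-run on the transformed data (iterating until rho settles)
y = [10.0, 12.0, 13.0, 15.0, 16.0]
y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
```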
5. Increase Data Frequency or Quality
Better data reduces systematic correlation.
Example:
Use weekly instead of yearly data
Improve measurement accuracy
Simple Intuitive Summary
Let’s summarize everything in a story-like way:
Distributed Lag Model → past causes affect present outcome
Auto-Regressive Model → past outcome affects present outcome
Autocorrelation → regression errors are related over time
Causes → missing variables, wrong model, time-series nature
Remedies → add variables, use lags, difference data, GLS
Final Easy Memory Tips
Distributed Lag → Xₜ₋₁ affects Y
Auto-Regressive → Yₜ₋₁ affects Y
Autocorrelation → eₜ related to eₜ₋₁
VIII. (a) Explain the uses of dummy variables.
(b) Explain the tests to detect the auto-correlaon problem in regression analysis.
Ans: (a) Uses of Dummy Variables
What are Dummy Variables?
Dummy variables are artificial variables created to represent categories or qualitative
attributes in regression models. They take values like 0 or 1 to indicate the presence or
absence of a particular condition.
Example: If we want to study wage differences between men and women, we can create a
dummy variable:
Male = 1
Female = 0
This way, gender (a qualitative factor) can be included in a regression equation.
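With a single 0/1 dummy, OLS has a simple closed form: the intercept is the mean of the base group (here, female) and the slope on the dummy is the difference in group means. A sketch with made-up wage figures:

```python
# Hypothetical wages with a gender dummy (male = 1, female = 0).
wages = [300, 320, 280, 350, 360, 400]
male  = [0,   0,   0,   1,   1,   1]

mean_female = sum(w for w, m in zip(wages, male) if m == 0) / male.count(0)
mean_male   = sum(w for w, m in zip(wages, male) if m == 1) / male.count(1)

intercept = mean_female          # a: baseline (female) mean wage
slope = mean_male - mean_female  # b: wage gap captured by the dummy
```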
Uses of Dummy Variables
1. Representing Qualitative Data
o They allow us to include categorical factors like gender, region, occupation,
or education level in regression models.
o Without dummy variables, regression would only handle numerical data.
2. Measuring Group Differences
o Dummy variables help compare outcomes across groups.
o Example: Wage differences between urban (1) and rural (0) workers.
3. Capturing Structural Changes
o They can represent policy changes, reforms, or events.
o Example: A dummy variable for years after economic liberalization (1 = post-reform, 0 = pre-reform).
4. Seasonal Effects in Time Series
o Dummy variables can capture seasonal patterns.
o Example: Quarterly sales data with dummies for Q1, Q2, Q3, Q4.
5. Interaction Effects
o Dummy variables can interact with continuous variables to measure
differential impacts.
o Example: Effect of education on wages may differ for men and women;
interaction terms capture this.
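Seasonal and group dummies are built mechanically from the category labels. A sketch for quarterly data; one quarter (Q1 here) is left out as the base category, the standard way to avoid perfect collinearity (the dummy-variable trap):

```python
# Quarter labels for 8 hypothetical periods of sales data
quarters = [1, 2, 3, 4, 1, 2, 3, 4]

# One dummy per non-base quarter; Q1 is the omitted base category
q2 = [1 if q == 2 else 0 for q in quarters]
q3 = [1 if q == 3 else 0 for q in quarters]
q4 = [1 if q == 4 else 0 for q in quarters]
# Each coefficient on q2..q4 then measures that quarter's shift relative to Q1
```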
Importance
Dummy variables make regression models more realistic by including qualitative aspects of
human behavior, policy, and environment. They bridge the gap between numbers and
categories, allowing richer analysis.
(b) Tests to Detect Auto-Correlation Problem in Regression Analysis
What is Auto-Correlation?
Auto-correlation occurs when error terms in a regression model are correlated across
observations, especially in time series data.
Ideal Case (No Auto-Correlation): Errors are independent.
Problem Case (Auto-Correlation): Errors in one period are related to errors in
another.
Example: In GDP growth data, if this year’s error is linked to last year’s error, auto-correlation exists.
Why is Auto-Correlation a Problem?
OLS estimates remain unbiased, but they are no longer efficient.
Standard errors are distorted, making hypothesis tests unreliable.
Confidence intervals and t-tests lose validity.
Tests to Detect Auto-Correlation
1. Graphical Method
o Plot residuals against time.
o If patterns (like cycles or trends) appear, auto-correlation may exist.
2. Durbin-Watson Test
o Most widely used test for first-order auto-correlation.
o Statistic ranges between 0 and 4:
Around 2 → No auto-correlation.
Well below 2 (towards 0) → Positive auto-correlation.
Well above 2 (towards 4) → Negative auto-correlation.
3. Breusch-Godfrey Test
o More general test, useful for higher-order auto-correlation.
o Based on regression of residuals on lagged values.
4. Runs Test
o Checks randomness of residuals.
o Too few or too many runs (sequences of positive/negative residuals) suggest
auto-correlation.
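The Durbin-Watson statistic, d = Σ(eₜ − eₜ₋₁)² / Σeₜ², is straightforward to compute from residuals. A sketch with two hypothetical residual series showing the two extremes:

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

smooth = [1.0, 0.9, 0.8, 0.7, 0.6]    # slowly drifting residuals -> d near 0 (positive)
alternating = [1.0, -1.0, 1.0, -1.0]  # sign-flipping residuals -> d = 3.0 (negative)
print(durbin_watson(smooth), durbin_watson(alternating))
```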
Remedies for Auto-Correlation
1. Transforming the Model
o Use lagged dependent variables or difference equations.
2. Generalized Least Squares (GLS)
o Adjusts estimation to account for correlation in errors.
3. Cochrane-Orcutt Procedure
o Iterative method to correct first-order auto-correlation.
4. Newey-West Standard Errors
o Adjusts standard errors to remain valid even with auto-correlation.
Conclusion
Dummy Variables: These are powerful tools to include qualitative factors in
regression. They help measure group differences, capture policy changes, seasonal
effects, and interaction impacts. Without them, regression would miss important
non-numeric influences.
Auto-Correlation: This problem arises when error terms are correlated across time
or observations. It makes OLS inefficient and hypothesis testing unreliable. Tests like
Durbin-Watson, Breusch-Godfrey, and Runs Test help detect it, while remedies
include GLS, Cochrane-Orcutt, and robust standard errors.
This paper has been carefully prepared for educaonal purposes. If you noce any
mistakes or have suggesons, feel free to share your feedback.